Text File | 1991-08-22 | 49KB | 1,107 lines

TURBOSTATS Survey Analysis System
=================================

M. C. Hart
698 Uppingham Road
Thurnby
Leicestershire LE7 9RN


Contents :
========

General Introduction                                ... Page  1
How does TURBOSTATS work ?                          ... Page  2
Brief description of the TURBOSTATS modules         ... Page  4
Running TURBOSTATS                                  ... Page  6
Description of the individual TURBOSTATS modules :
    TS-FREQ1                                        ... Page  7
    TS-CROSS                                        ... Page  9
    TS-STATS                                        ... Page 13
    TS-ENTRY                                        ... Page 17
    TS-CASES                                        ... Page 17
TURBOSTATS utilities
    SD (Sorted Directory)                           ... Page 18
    SNAPSHOT                                        ... Page 18
Interfacing with Graphics                           ... Page 18
TURBOSTATS capacities                               ... Page 19
DO's and DON'Ts !                                   ... Page 20

Version 2.01 Issued   : October, 1989
Public domain version : September, 1991

                                                          Page 1

GENERAL INTRODUCTION
====================

TURBOSTATS is the name given to a suite of programs designed to
work with each other in the analysis of survey data. Each program
may be run as a 'stand-alone' program or as part of an integrated
system. The TURBOSTATS system is closely modelled upon SPSS
(Statistical Package for the Social Sciences) and is designed to
give output similar to that offered by the SPSS 'Frequencies' and
'Crosstabulations' commands.

The analysis of survey material tends to fall into the following
categories :

(i)   the counts and percentages of the various values taken by a
      single variable ( e.g. those replying 'Yes', 'No' or 'Do not
      Know' in response to a survey question). From this we can
      form a FREQUENCY DISTRIBUTION.

(ii)  the formation of tables typically involving two variables,
      known as CONTINGENCY or CROSS-TABULATION tables. For example,
      we could have a table with a variable SEX subdivided into
      'Male' and 'Female' on one axis whilst the other axis might
      be a variable INCOME subdivided into 'High' and 'Low'. The
      CONTINGENCY TABLE would display the numbers of cases that
      fall into each of the resulting 'cells' as well as computing
      other relevant statistics.

(iii) hypothesis tests designed to measure whether the mean of one
      variable or sub-group in the data differs significantly from
      that of another variable or sub-group in the data. Another
      form of hypothesis test might be to discover whether, in a
      contingency table, the type of newspaper read differs by sex,
      for example.

                                                          Page 2

HOW DOES TURBOSTATS WORK ?
========================

In order to function, the TURBOSTATS modules require two files of
data :

(i)   a data file consisting of numbers, separated from each other
      by spaces, commas or semi-colons. Such a file is often known
      as a CSV (Comma Separated Value) file e.g.

          1,2,3
          1,4,5
          2,6,2
          .. etc.

(ii)  a labels file which will supply names for the individual
      variables ( e.g. SEX, PAPER) and labels for the individual
      values that each variable might take. For example SEX would
      typically have labels of 'Male' and 'Female' whilst PAPER
      might have 'Quality', 'Tabloid', 'Sunday' etc.

These files can be created in several ways. For fairly small
surveys ( e.g. 100 cases or less) you could use the TS-ENTRY
module. For larger surveys, it might be more cost-effective in
terms of time to input data using dBASE III and to create data
files with the dBASE III command :

    COPY TO filename.ext DELIMITED

It is also possible to create the data and labels files by using
your favourite word-processor or text-editor ( e.g. WordStar in
non-document mode). In the latter case, though, you would not have
the benefit of any error-checking or correction facilities.

A labels file might look like the following :

    "SEX","Sex of Individual"
    "SEX","Male"
    "SEX","Female"
    "CLASS","Social Class"
    "CLASS","Professional"
    "CLASS","Intermediate"
    "CLASS","Skilled Manual"
    "CLASS","Semi-skilled Manual"
    "CLASS","Unskilled Manual"
    "CLASS","Pensioners"
    "CLASS","Not classified"
    "PAPER","Newspaper read"
    "PAPER","None"
    "PAPER","Quality"
    "PAPER","Middle-brow"
    "PAPER","Tabloid"
    .. etc.

                                                          Page 3

The TURBOSTATS system will assume that the first variable name
encountered in a labels file will relate to the first column of
data found in a data file. Similarly, the second variable found
will relate to the second column of data and so on. Care should be
taken to ensure that the variable names match up with the various
columns of numbers as TURBOSTATS has no way of 'knowing', other
than by position in a list, which variable name matches up with
which column of data.

The labels work in a similar fashion. Once the TURBOSTATS system
has identified the 'starting point' in the labels file, then it is
assumed that :

- the first entry will be a label which expands upon the name of
  the variable (known as a VARIABLE LABEL). For example, FINCOME
  might be a variable name which you wish to label as 'Father's
  income'.

- each subsequent label relates to one of the values taken by the
  variable and consequently is known as a VALUE LABEL. The labels
  should cover the range from the minimum to the maximum values of
  that variable likely to be encountered in the data set.

In this respect, TURBOSTATS does not differ materially from the
SPSS philosophy. Care should be exercised to ensure that variable
names match up with the appropriate columns.

Missing Values
~~~~~~~~~~~~~~

A problem with all survey material is what to do with those cases
where, for a variety of reasons, the question has not been
completed. For example, a question on 'Father's Income' cannot be
answered if the respondent's father is dead or if the income is
unknown. In such cases, the survey analyst assigns a 'MISSING
VALUE' number to such cases e.g. the number 0, 9 or -1, as long as
it is an integer (i.e. a whole number). In subsequent analyses,
TURBOSTATS will request MISSING VALUE code numbers and use these to
exclude data from further analysis (although typically reporting
the number of cases that fall into the MISSING VALUES category).

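The positional matching of labels to data columns, and the setting
aside of MISSING VALUE codes, can be illustrated with a short Python
sketch. It is not part of TURBOSTATS and does not claim to reproduce
its internals: the function names parse_labels and frequencies are
hypothetical, the tiny data set is invented for the illustration, and
the assumption that value labels are numbered from 1 upwards is taken
only from the sample screens (Male = 1, Female = 2).

```python
import csv
from collections import Counter


def parse_labels(lines):
    """Return [(name, variable_label, [value_labels])] in file order.

    The first line seen for a name is taken as its VARIABLE LABEL and
    every later line with that name as a VALUE LABEL.
    """
    variables = []
    index = {}
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # this sketch simply skips blank or short lines
        name, text = row[0].strip().upper(), row[1].strip()
        if name not in index:
            index[name] = len(variables)
            variables.append((name, text, []))
        else:
            variables[index[name]][2].append(text)
    return variables


def frequencies(rows, column, missing=(0,)):
    """Count the codes in one data column, setting missing codes aside."""
    counts = Counter(row[column] for row in rows)
    n_missing = sum(counts.pop(code, 0) for code in missing)
    n_valid = sum(counts.values())
    table = []
    for code in sorted(counts):
        valid_pct = 100.0 * counts[code] / n_valid
        table.append((code, counts[code], valid_pct))
    return table, n_valid, n_missing


labels = parse_labels([
    '"SEX","Sex of Individual"', '"SEX","Male"', '"SEX","Female"',
    '"CLASS","Social Class"', '"CLASS","Professional"',
])
# Invented four-case data file: column 0 is SEX, column 1 is CLASS.
data = [[int(item) for item in line.split(",")]
        for line in ["1,1", "2,1", "1,9", "9,1"]]

# Position in the labels file decides the data column.
names = [name for name, _, _ in labels]
table, n_valid, n_missing = frequencies(data, names.index("SEX"), missing=(9,))
for code, count, valid_pct in table:
    value_label = labels[0][2][code - 1]  # assumes codes start at 1
    print(f"{value_label:<8}{code:>3}{count:>5}{valid_pct:>7.1f}")
print("Valid cases", n_valid, " Missing cases", n_missing)
```

Because the match is purely positional, swapping two variables in the
labels file would silently attach the wrong names to the columns,
which is why the text above stresses checking the order by hand.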
                                                          Page 4

BRIEF DESCRIPTION OF THE TURBOSTATS MODULES
===========================================

The TURBOSTATS system provides three modules which are designed to
analyse survey data (TS-FREQ1, TS-CROSS and TS-STATS) and a further
two to aid the entry and editing of data files (TS-ENTRY,
TS-CASES). In addition, utilities are supplied to produce sorted
directories and to capture screen outputs into files for subsequent
processing in reports. Provision is also made for access to your
favourite spreadsheet package if you wish to process your data in a
graphical form. Each of these will now be described briefly :

TS-FREQ1 provides for the frequency distribution of the values in a
single variable measured at the nominal level. This is the module
best used to analyse the patterns of response to a single question.
The output consists of counts, percentages and a simple bar-chart.
It is also possible to save results in a file should you wish to
import these later into a graphics package for further analysis.

TS-CROSS provides for contingency tables of two variables measured
at the nominal level. This is the module best used to examine the
operation of two variables together ( e.g. sex and newspaper
readership). At its simplest, TS-CROSS provides simple counts for
the number of cases that fall into each 'cell' but it can also
generate the column percentages, row percentages, total
percentages, expected values and chi-square values for each cell in
the table.

TS-STATS is the module which provides the more specialised
statistical information required on either one or two variables. If
two variables are specified then a range of bi-variate statistics
is also calculated, including the correlation coefficient, the
regression equation and the 't-test' for the difference in means.
It is also possible, upon request, to use this module to perform
't-tests' (i.e. tests of statistical significance) on sub-groupings
within a variable.
For example, it would be possible to discover whether the mean
income for 'Females' differs from the mean income for 'Males' in a
data set. It is also possible to display histograms of variables
and a scatterplot of the joint distribution of two variables.

                                                          Page 5

TS-ENTRY is the module that is used to create the files for :

(i)   variable and label names
(ii)  the input of (numerical) data.

A labels file needs to be created first in order that the variable
names can supply prompts for the various values before the input of
numerical data. To simplify the operation of TS-ENTRY, the module
is not designed to alter or modify existing label files. If the
modifications are minor, this is best achieved using your usual
word-processor/text editor. In the event of major modifications,
you would be well advised to create a brand-new labels file in any
case.

TS-CASES is a module which creates sub-files of your data for more
detailed analysis. For example, you could create a file containing
only 'Males' so that you can then examine relationships further
within the data that relate only to 'Males'.

A utility is provided that enables you to view a sorted directory,
operated from the principal menu, should you forget a filename.
This utility also gives you the file size and an indication of the
space free upon your disk.

Provision is also made for you to load the spreadsheet of your
choice (e.g. the LOTUS 1-2-3 clone AS-EASY-AS) in order to access
the advanced graphics capacities of such a package.

                                                          Page 6

RUNNING TURBOSTATS
==================

Running the TURBOSTATS system is quite simple.

(1) If you are installing the system for the first time on a hard
    disk, then copy all of the files on the disk over to a
    subdirectory of your choice. Then run the TS-INSTL program.

(2) If you are running the program from either a floppy or a hard
    disk, then you may run the whole integrated system with the
    command

        TS [Dr A:]

    (where Dr A: represents the drive upon which you would like the
    'screensnap' files to be stored) or you may run any of the
    programs by name directly i.e.

        TS-MENU    (Menu and loader program)
        TS-FREQ1   (Frequencies)
        TS-CROSS   (Cross-Tabulations)
        TS-STATS   (Statistics of one or two variables)
        TS-ENTRY   (Label and data entry)
        TS-CASES   (Creates sub-files of data)

(3) If you run the integrated system TS then a batch file is loaded
    which will make the screen capture program (SNAP.EXE) memory
    resident and remind you how it is to be activated. Make a note
    of the command that is necessary to 'snap' your screen
    pictures : i.e. PRTSC

    Subsequently, when the program terminates, the batch file will
    run another program (DEVELOP.EXE) which will 'develop' your
    screen snaps into files named SNAPSHOT.01..SNAPSHOT.30. Make
    sure that you have sufficient space on disk to hold your
    snapshots : each will take a maximum of 2000 bytes.

    If you have 'old' snapshots on disk ( i.e. SNAPSHOT.01 ..
    SNAPSHOT.30) then rename these to another name (e.g. OLDSNAP.01
    .. OLDSNAP.30) before you start a new session, as otherwise the
    SNAP.EXE program will overwrite the old 'SNAPSHOTS' found on
    your disk.

                                                          Page 7

DESCRIPTION OF THE INDIVIDUAL TURBOSTATS MODULES
================================================

TS-FREQ1
========

Sample input screens :
~~~~~~~~~~~~~~~~~~~~

TS-FREQ1          TURBOSTATS          (c) M.C. Hart [1989]
~~~~~~~~          ~~~~~~~~~~          ~~~~~~~~~~~~~~~~~~~~~
Performs frequency counts,barcharts of raw (nominal) data..

Name of raw data file ?  mysurvey.txt
Name of labels file   ?  labels.txt
-------------------------------------------------------------------
TS-FREQ1          TURBOSTATS          File: MYSURVEY.TXT
~~~~~~~~          ~~~~~~~~~~
Performs frequency counts,barcharts of raw (nominal) data..

Variable List - [Y]es or [N]o .. [X] to exit

ID   SEX   CLASS   PAPER

Variable ?  sex

Missing Values should be integers in the range -32768..32767
e.g. [0] [9] [-1]   [ 0 by default ]

Missing Values  9
-------------------------------------------------------------------

Sample output screen :
~~~~~~~~~~~~~~~~~~~~

SEX      Sex of Individual                    File: MYSURVEY.TXT

                                                  Valid     Cum
Value Label           Value  Frequency  Percent  Percent  Percent

Male                      1        136     50.4     51.5     51.5
Female                    2        128     47.4     48.5    100.0
                          9          6      2.2  MISSING
                               -------  -------  -------
                  TOTAL            270    100.0    100.0

Male     ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀  136
Female   ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀  128

Valid Cases    264      Missing Cases    6

                                                          Page 8

The important point to remember about TS-FREQ1 is that it is
designed to deal only with categorical (nominal) data. This is data
in which numbers 'stand for' categories in the data rather than
being regarded as entities in their own right. We would not wish to
perform statistical operations upon such numbers for they are
essentially 'labels' or 'flags' that indicate different categories
of the variable under consideration. If, in a random survey, we
named a variable SEX and coded 'Female' as 1 and 'Male' as 2 then
we could count up the numbers of '1s' (i.e. Females) and '2s' (i.e.
Males) and also perform such calculations as the percentage each
contributes to the total. But if we had 50 cases of 'Male' and 50
cases of 'Female', it would not make sense to average the numbers
( to produce a mean of 1.5) because the numbers are essentially
meaningless.

The values to be counted should be in the range 1-20, and the
module works best if the range is 0-8 (with 9 for missing values)
or 1-9 ( with 0 used for missing values).

Be careful to specify the exact drive and filename of your data and
label files. If you omit the extension, then TURBOSTATS will assume
that you intend a file with the .TXT extension and will add this
extension automatically to your filename.

A barchart is generated automatically, but on the next 'page' or
'screen' if space is limited. Press the ENTER key to get the next
page of output.
This instruction is NOT shown on screen in order to keep the screen
free of instructions should you wish to capture output for a
subsequent report.

You also have the chance to save your output in an output file and
should follow the system prompts carefully, making sure that your
filename is a legitimate MS-DOS filename i.e. 1-8 characters with
no embedded spaces and with an extension e.g. a:myfile.txt

                                                          Page 9

TS-CROSS
========

Sample input screens :
~~~~~~~~~~~~~~~~~~~~

TS-CROSS          TURBOSTATS          (c) M.C. Hart [1989]
~~~~~~~~          ~~~~~~~~~~
Constructs contingency tables from raw (nominal) data..

Variable List - [Y]es or [N]o .. [X] to exit

ID   SEX   CLASS   PAPER

First variable  ?  sex
Second variable ?  class

Missing Values should be integers in the range -32768..32767
e.g. [0] [9] [-1]   [ 0 by default ]

Missing Values  9
-------------------------------------------------------------------
The data is now entered..

In the contingency table, you have a choice of options
as well as the cell counts. These are

    [1] Row percentages
    [2] Column percentages
    [3] Total percentages
    [4] Expected values
    [5] Chi-square statistic

If you want to choose the option, then give the OPTION
number when prompted. Options will be printed in the
order you specify.. Specify 0 if you do NOT want the option ..

First Choice   [Option No]  1
Second Choice  [Option No]  2
Third Choice   [Option No]  4
Fourth Choice  [Option No]  5
Fifth Choice   [Option No]  0
--------------------------------------------------------------------

                                                          Page 10

TS-CROSS
========

Sample output screen :
~~~~~~~~~~~~~~~~~~~~

Crosstabulation of SEX    Sex of Individual    File: MYSURVEY.TXT
                By CLASS  Social Class

     CLASS > │Profes Interm Skille Semi-s Unskil Pensio Not cl│ ROW
             │sional ediate d Manu killed led Ma ners   assifi│TOTAL
SEX          │   1      2      3      4      5      6      7  │
             │──────┼──────┼──────┼──────┼──────┼──────┼──────┼
Male        1│  24  │  17  │  15  │  27  │  33  │   4  │  26  │ 146
  [Row %]    │ 16.4 │ 11.6 │ 10.3 │ 18.5 │ 22.6 │  2.7 │ 17.8 │51.4%
  [Col %]    │ 57.1 │ 77.3 │ 40.5 │ 50.9 │ 50.0 │ 15.4 │ 68.4 │
  [Exp  ]    │ 21.6 │ 11.3 │ 19.0 │ 27.2 │ 33.9 │ 13.4 │ 19.5 │
  [Chis ]    │  0.3 │  2.9 │  0.9 │  0.0 │  0.0 │  6.6 │  2.1 │
             │──────┼──────┼──────┼──────┼──────┼──────┼──────┼
Female      2│  18  │   5  │  22  │  26  │  33  │  22  │  12  │ 138
  [Row %]    │ 13.0 │  3.6 │ 15.9 │ 18.8 │ 23.9 │ 15.9 │  8.7 │48.6%
  [Col %]    │ 42.9 │ 22.7 │ 59.5 │ 49.1 │ 50.0 │ 84.6 │ 31.6 │
  [Exp  ]    │ 20.4 │ 10.7 │ 18.0 │ 25.8 │ 32.1 │ 12.6 │ 18.5 │
  [Chis ]    │  0.3 │  3.0 │  0.9 │  0.0 │  0.0 │  6.9 │  2.3 │
             │──────┼──────┼──────┼──────┼──────┼──────┼──────┼
TOTAL           42     22     37     53     66     26     38    284
               14.8%   7.7%  13.0%  18.7%  23.2%   9.2%  13.4% 100.0%

Valid cases = 284    Missing = 16

Total chi-square   D.F.   Significance   Cells with E.F. < 5
     26.161          6       0.0002        0 of 14 ( 0.0% )

                                                          Page 11

Contingency tables also require two variables measured at the
nominal (categorical) level. The output is designed so that a
maximum of NINE columns may be displayed horizontally on the
screen. If your data contains more than nine categories, it may be
unnecessarily complex in any case and consideration should be given
to collapsing the categories so that there is a maximum of nine.

Several options are given as well as the cell counts, which are
always supplied.
These are :

- Column %     (Proportion the cell contributes to the column
               total)

- Row %        (Proportion the cell contributes to the row total)

- Total %      (Proportion the cell contributes to the overall
               total)

- Expected     The value expected in each cell if the proportions
               of the row totals are applied to the relevant column
               totals (i.e. if there is no relationship between the
               two variables)

- Chi-square   A value calculated for each cell from the formula :

                   (Observed - Expected)²
                   ----------------------
                          Expected

               which is then totalled over all cells to produce a
               total chi-square ( often designated as X²)

The 'p' value is the probability of obtaining a chi-square at least
as large as the one observed by chance alone, if the two variables
are in fact unrelated. It will take a value between 0 and 1. An
output of p=0.05 means that there is only a 5% chance (1 in 20)
that an association as strong as the one found in the data could
have occurred by chance alone. The 5% level is the conventional
'significance level' used to test a statistical hypothesis. A value
of p=0.0000 means a probability of 5 in 100,000 or less i.e.
practically zero. Remember that a LOW 'p' value indicates that it
is likely that the variables are significantly related and vice
versa.

                                                          Page 12

Special case of a 'single value' column or row
----------------------------------------------

Under these circumstances, a normal contingency table is not
possible. However, TS-CROSS will sense this special case and
produce a 'GOODNESS OF FIT' test. For example, if we had the
following data :

                           PAPERS
                  Quality   Tabloid   The Rest   TOTAL
SEX=1 (Male)         40        30        30       100
(Expected)          33.3      33.3      33.3

Notice that TS-CROSS has taken the 100 cases and calculated the
expected frequencies by assuming that they will be evenly
distributed ( i.e. a third or 33.3% in each cell) before
calculating the appropriate chi-square.

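The arithmetic behind the 'Expected' and 'Chi-square' options, and
behind the goodness-of-fit special case, can be sketched in Python as
follows. This is an illustration of the standard formulae only and is
not the TS-CROSS source code: the function names expected_counts,
chi_square and goodness_of_fit are hypothetical, and the 'p' value is
deliberately left out because it needs the chi-square distribution,
which is beyond a short sketch.

```python
def expected_counts(table):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square(table):
    """Total chi-square and degrees of freedom for a two-way table."""
    expected = expected_counts(table)
    total = sum((o - e) ** 2 / e
                for obs_row, exp_row in zip(table, expected)
                for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return total, df


def goodness_of_fit(observed):
    """Chi-square for one row, assuming an even spread across the cells."""
    expected = sum(observed) / len(observed)
    total = sum((o - expected) ** 2 / expected for o in observed)
    return total, len(observed) - 1


# Cell counts copied from the SEX by CLASS sample screen.
sex_by_class = [
    [24, 17, 15, 27, 33, 4, 26],   # Male
    [18, 5, 22, 26, 33, 22, 12],   # Female
]
total, df = chi_square(sex_by_class)
print(f"Total chi-square {total:.3f}  D.F. {df}")

# The single-row PAPERS example: 40, 30 and 30 cases.
fit, fit_df = goodness_of_fit([40, 30, 30])
print(f"Goodness of fit  {fit:.3f}  D.F. {fit_df}")
```

Run on the sample table, the first call should land close to the
total chi-square of 26.161 on 6 degrees of freedom shown on the
sample output screen, and the expected count for the first cell
(146 x 42 / 284) rounds to the 21.6 printed there.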
                                                          Page 13

TS-STATS
========

Sample output screens :
~~~~~~~~~~~~~~~~~~~~~

File: MYSURVEY.TXT                        SEX        CLASS

Measures of Central Tendency
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mean                                    1.478        4.108
Median                                  1.000        4.000
Mode                                    1.000        5.000

Measures of Dispersion
~~~~~~~~~~~~~~~~~~~~~~
Minimum                                 1.000        1.000
Maximum                                 2.000        7.000
Range                                   1.000        6.000
First Quartile                          1.000        3.000
Third Quartile                          2.000        6.000
Semi-Interquartile Range                1.000        3.000
Variance                                0.250        3.567
Stan.dev [pop-n]                        0.500        1.889
Stan.dev [sample]                       0.500        1.892
S.E.Mean                                0.029        0.112

Measures of Distribution Shape
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Skewness                                0.088       -0.183
Kurtosis                               -1.999       -0.965
-----------------------------------------------------------------
File: MYSURVEY.TXT                        SEX        CLASS

Numbers of Cases
~~~~~~~~~~~~~~~~
N                                         293          287
Missing Values                              7           13
N (valid pairs)                                  284

Summary Statistics
~~~~~~~~~~~~~~~~~~
Σx, Σy                                    433         1179
Σx²,Σy²                                   713         5867
Σx, Σy  (adjusted : pair-wise deletion)   422         1161
Σx²,Σy² (adjusted : pair-wise deletion)   698         5759
Σxy                                             1740

Bi-variate Statistics
~~~~~~~~~~~~~~~~~~~~~
Correlation     r = 0.0554     t = 0.932     p = 0.352
Regression      y (CLASS ) = 3.777 + 0.209 * x (SEX )
T-Test (difference in means)
                t = 22.785     D.F. = 325.04     p = 0.000

                                                          Page 14

'T'-test : Sample input and output screens :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perform a t-test on the variables  [Y]es [N]o

It is necessary to divide the variable SEX
into two groups to perform the t-test

Minimum of Group 1   1
Maximum of Group 1   1
Name you wish to give to Group 1
[8 characters or less]   Males

Minimum of Group 2   2
Maximum of Group 2   2
Name you wish to give to Group 2
[8 characters or less]   Females
-------------------------------------------------------------------
Twosample test of SEX by CLASS             File: MYSURVEY.TXT

SEX                     N      MEAN     STDEV   SE MEAN
Group 1   Males       153     4.216     2.221     0.180
Group 2   Females     140     4.264     1.845     0.156

T-Test (difference in means)
                t = 0.204     D.F. = 288.37     p = 0.8382

                                                          Page 15

Histogram of CLASS

Minimum of CLASS is 1.0       Histogram minimum ?  1
Maximum of CLASS is 7.0       Histogram maximum ?  7
No of classes in the histogram [2-20] ?  7

Histogram of CLASS   Social Class          File: MYSURVEY.TXT

CLASSES   COUNT   PERCENT
  7.0       38     13.2%   ************
  6.0       29     10.1%   *********
  5.0       66     23.0%   **********************
  4.0       53     18.5%   *****************
  3.0       37     12.9%   ************
  2.0       22      7.7%   *******
  1.0       42     14.6%   **************
        --------------
Total      287    100.0%
Missing Cases  13
----------------------------------------------------------------------------
Plot of CLASS against PAPER   r= 0.0378    File: MYSURVEY.TXT

[Scatterplot screen : PAPER on the vertical axis (0.0 to 8.0)
 against CLASS Social Class on the horizontal axis (1.0 to 7.0),
 with each occupied position marked by a single '*'. The exact
 positions of the points are not recoverable in this copy of the
 text.]

                                                          Page 16

TS-STATS will produce the range of 'univariate' statistics on
either one or two variables. If two variables are specified, then
in addition to the univariate statistics, the following bivariate
statistics are also produced :

- correlation coefficient (r), which measures the strength of the
  linear relationship between the two variables. The correlation
  coefficient (technically known as Pearson's r) may take a value
  that lies between -1 and +1; values near zero indicate little or
  no linear relationship. Note that correlation cannot be taken to
  imply causation. A t-test and probability for the correlation
  coefficient are also calculated.

- regression equation, in which the equation of a 'line of best
  fit' is calculated for the data. The regression equation allows
  one to predict the values for the dependent variable ( = y ) if
  given the value of the independent variable ( = x ). For further
  details of correlation and regression, consult a standard
  statistical textbook.

- a t-test to test whether or not there is a statistically
  significant difference between the means.

If required, a 't-test' may be performed which allows one to take
the categories of one variable ( e.g. 1='Male' and 2='Female' in a
variable named SEX) and calculate whether or not there is a
statistically significant difference between the two groups with
respect to the other variable chosen. You will be prompted for
maximum and minimum values to facilitate dividing one variable into
two sub-groups. If you have several categories that are not
contiguous, then you will probably have to reorder the data in your
original data file ( as well as amending the corresponding label
files).

Facilities are also available to view histograms and scatterplots.
In the case of histograms, the minimum and maximum of each variable
will be shown and you are free to accept these or to substitute
others of your own. Then you will be asked to suggest the number of
classes (i.e. divisions) in the data. You will be well advised to
choose categories that are consistent with the data e.g. if the
minimum and maximum are 1 and 7 respectively then choose 7 classes,
rather than 10.

A simple scatterplot is also available on request. Note that the
correlation coefficient between the two variables is displayed but
that TURBOSTATS does not distinguish between multiple plots at the
same screen location.

                                                          Page 17

TS-ENTRY
========

This module is used to create variable names, variable labels and
value labels as well as entering the raw data itself. These terms
are also used in SPSS but are defined and illustrated below :

VARIABLE NAME     A name of 1-8 characters from the set
                  [A..Z,0..9,_,-]

VARIABLE LABELS   A brief label ( up to 25 characters ) which may
                  be used to amplify the meaning of the necessarily
                  brief variable name itself. e.g. INCOME could
                  have the label of "Anticipated Annual Salary"

VALUE LABELS      A brief description of each value that a variable
                  may take ( up to 15 characters only)

Brief variable labels may be preferable to long ones, as under
certain circumstances the variable label is truncated (i.e. cut
down) to some eight characters.
This is most likely to happen in TS-STATS when there are nine
columns horizontally across the screen.

The operation of TS-ENTRY is self-explanatory and you generally
have an opportunity to correct errors in both the label entry and
the data entry sections. If you wish to amend the label files that
you have already created, this is best done with your usual
word-processor/text editor.

TS-CASES
========

This module is used to create sub-files of data from your original
data set. For example, you could choose to have a file which
contains only 'Males' or alternatively a file which excludes
'Males'. The module is self-explanatory in operation. Generally,
you will wish to 'include' the values of the variable that you have
chosen in your new sub-file. However, it is possible that you wish
to create a file which contains all of the values of the variable
EXCEPT the ones that you have indicated, and in this case you would
choose to EXCLUDE those values from your new sub-file. Do remember
to choose a different name for your new sub-file!

                                                          Page 18

TURBOSTATS UTILITIES
====================

SD (Sorted Directory)
~~~~~~~~~~~~~~~~~~~~

SD is a simple utility which is available from the principal menu
and gives a sorted directory. The size of each file is specified in
bytes and there is also an indication of the amount of free space
available on the disk.

SNAP.EXE Capturing screen output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A specially written utility (SNAP.EXE) is provided and this is made
memory-resident to enable 'snaps' to be taken of the screen. To
'snap' a picture, press PRTSC. (The 'normal' function of this key,
i.e. to provide screen dumps on the printer, will be restored later
by the DEVELOP.EXE program.) This will record a picture of the
screen in memory and later the DEVELOP.EXE program will 'develop'
these pictures into files named SNAPSHOT.01..SNAPSHOT.30. These
files may be printed out or read into other documents if it is
wished to incorporate them into other reports.
You should also ensure that you have a disk (usually in Drive A:)
with sufficient space for the screen snaps, each of which will take
a maximum of 2000 bytes.

Interfacing with Graphics
~~~~~~~~~~~~~~~~~~~~~~~~~

There are some limited plotting capabilities provided by TURBOSTATS
but it is possible to complement these with the graphics facilities
available in public domain/shareware programs such as the LOTUS
1-2-3 'clone' 'AS-EASY-AS'. Provision is made on the main menu for
you to load the package of your own choice. The assumption here is
that the relevant parts of the package are available on your
default drive.

                                                          Page 19

TURBOSTATS CAPACITIES
=====================

Number of cases
~~~~~~~~~~~~~~~

TS-FREQ1 and TS-CROSS     7500 cases
TS-STATS                  2000 cases

Number of variables
~~~~~~~~~~~~~~~~~~~

For technical reasons, an input line from your data file may only
be 254 characters in length. Remembering that a position is
occupied by each delimiter ( e.g. a space or a comma), then
TURBOSTATS can accommodate

    127 variables of length 1 (e.g. 1,2,3)
     84 variables of length 2 (e.g. 10,12,14)
     62 variables of length 3 (e.g. 123,456,6.7)

If you have a large data set, then consider splitting your whole
project into two or more files, ensuring that in each file you keep
together those variables that you wish to cross-tabulate or
correlate.

Number of variable/value labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All modules                                  300 lines of text
Number of variable/value labels processed
by the TS-ENTRY module                       200 lines of text

                                                          Page 20

DO's and DON'Ts !
===============

DO :
~~

(1) Take great care that your labels file matches up EXACTLY with
    your data file.
    Your two files should match up as in the example below :

"SEX","Sex of Individual"        │               1,2,1
"SEX","Male"                     ├──────────┐    1,1,2
"SEX","Female"                   │          │    2,3,2
"YEAR","Year of Course"          │          │    1,2,1
"YEAR","First Year"              │          └────┘ │ │
"YEAR","Second Year"             ├─────────────────┘ │
"YEAR","Third Year"              │                   │
"DRIVER","Holds Driving Licence" │                   │
"DRIVER","Can drive"             ├───────────────────┘
"DRIVER","Cannot drive"          │

(2) Ensure that the type of data that you have is appropriate for
    the module that you are using to analyse the data. The
    following table should clarify the position :

┌─────────────────────────────────────────┬───────────────┐
│ TYPES OF DATA                           │ MODULE        │
├─────────────────────────────────────────┼───────────────┤
│ Nominal (Categorical) data :            │               │
│ ~~~~~~~~~~~~~~~~~~~~~~~~~~              │               │
│ Integers typically in the range 1-9     │ TS-FREQ1      │
│ used as answers to questions ..         │ TS-CROSS      │
│                                         │ TS-STATS      │
├─────────────────────────────────────────┼───────────────┤
│ Interval OR Ratio data                  │               │
│ ~~~~~~~~~~~~~~~~~~~~~~                  │               │
│ May be large numbers which may          │ TS-STATS      │
│ contain a decimal place. An example     │ only!         │
│ would be a figure for a salary (e.g.    │               │
│ 9500) or a height (5.5 feet)            │               │
└─────────────────────────────────────────┴───────────────┘

                                                          Page 21

(3) Make sure that your initial data file does not contain blank
    lines at the beginning or at the end of the file. Also it is
    important that the data in each line should be exactly as shown
    in (1) above, with no spaces between the data items, with the
    data items separated by a comma (,) and with each line
    terminated by a normal carriage return ( i.e. the CR/LF pair of
    bytes ). If the package 'locks up' after reading a datafile,
    then in all probability the cause will be found in a datafile
    which contains some of the errors mentioned above.

    Ensure that the labels file also contains no blank lines and
    that the number of value labels is consistent with the data
    set. In particular, try to ensure a consistent spelling of the
    variable names, which should be in upper case.

(4) Take care with specifying your drive and MS-DOS filenames,
    which should not contain embedded blanks or unconventional
    characters. A typical filename might be :

        a:myfile.txt

    Note : no spaces, filename of eight characters or less,
    extension specified.

(5) Expand your knowledge by reading appropriate statistical texts
    if necessary.

(6) USE the CTRL-BREAK keys to abort a module should you find that
    you have made an irrecoverable error and you wish to return to
    the principal menu.

DO NOT :
~~~~~~

(1) Attempt to write to a disk which is full or write-protected

(2) Use categories outside the range 1-9 ( or 0-8 ) in the modules
    TS-FREQ1 and TS-CROSS. Collapse your data if necessary so that
    you do not have more than nine categories in either direction
    in these two modules.
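
Several of the points above (no blank lines, comma-separated items
with no spaces, the same number of items on every line, and the
254-character line limit given under TURBOSTATS CAPACITIES) can be
checked mechanically before a data file is handed to any module. The
Python sketch below is one possible way of doing so; it is not a
TURBOSTATS utility, and the function name check_data_lines and the
wording of its messages are hypothetical.

```python
def check_data_lines(lines, max_len=254):
    """Return a list of (line number, problem) pairs for a data file."""
    problems = []
    expected_fields = None
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if text.strip() == "":
            problems.append((number, "blank line"))
            continue
        if len(text) > max_len:
            problems.append((number, f"longer than {max_len} characters"))
        if " " in text:
            problems.append((number, "contains a space"))
        fields = text.split(",")
        if expected_fields is None:
            expected_fields = len(fields)   # first data line sets the width
        elif len(fields) != expected_fields:
            problems.append((number, f"{len(fields)} items, expected "
                                     f"{expected_fields}"))
        for field in fields:
            try:
                float(field)
            except ValueError:
                problems.append((number, f"non-numeric item {field!r}"))
                break
    return problems


# Invented sample lines: the second, third and fourth are faulty.
sample = ["1,2,1", "", "1, 2", "1,x,3"]
for number, problem in check_data_lines(sample):
    print(f"line {number}: {problem}")
```

For a file on disk, the lines can be supplied with
check_data_lines(open("mysurvey.txt")), where mysurvey.txt stands for
whatever data file you are about to analyse. An empty result does not
guarantee that the labels file matches the data, which still has to
be checked by eye as described under DO (1).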